Research on robust fault-tolerant control of the controllable suspension based on knowledge-data fusion driven

For the robust fault-tolerant control of the controllable suspension system, a control strategy driven by knowledge-data fusion is proposed. Firstly, the boundary fuzziness between perturbation type uncertainty and gain type fault is analyzed, and then a data-driven method is introduced to avoid the state estimation of system uncertainty and fault. The proximal policy optimization algorithm in reinforcement learning is selected to construct a “data control law”, to deal with uncertainty and fault. On the other hand, based on the classical sky-hook control, the “knowledge control law” for system performance optimization is designed, taking into account the nonlinear and non-stationary characteristics of the system. Furthermore, the dependency between robust fault tolerance and performance optimization control is revealed, and the two control laws are fused by numerical multiplication, to realize the performance matching optimization control of robust fault tolerance of controllable suspension system driven by knowledge-data fusion. Finally, the effectiveness and feasibility of the proposed method are verified by the simulation and real-time experiment of non-stationary excitation and near-stationary excitation under the combination of uncertainty and fault.


Uncertainty and fault analysis of the controllable suspension system
The composition of the controllable suspension system is shown in Fig. 1. It generally includes the sensor unit, the controller, the actuators (adjustable air spring, adjustable shock absorber), the guide rods, and so on. Depending on whether the actuator can actively generate force, the system is classified as an active or a semi-active suspension. Semi-active suspensions are further divided, according to the controllable parameter, into adjustable damping, adjustable stiffness, adjustable height, and combined adjustable damping, stiffness, and height types. This paper takes the adjustable damping suspension system as its research object. It differs from the traditional passive suspension in two respects. First, its damping force is determined by both the shock absorber velocity and the control current, which superimposes two kinds of nonlinear characteristics. Second, the introduction of the electronic control unit increases the probability of uncertainty and fault. The interaction of these two points increases the challenge of robust fault-tolerant control. www.nature.com/scientificreports/

Nonlinear characteristics of the controllable suspension system
The controllable suspension system is a complex system with nonlinear, time-varying, and uncertain characteristics. The more comprehensively the system characteristics are considered, the closer the model is to the real suspension state, but the more difficult it is to analyze and solve. The more simplified the model, the easier it is to analyze and solve, but the lower its accuracy. Therefore, according to the analysis requirements of this study, which focuses on vertical control of the adjustable damping suspension, and referring to existing research 15,17,18,28,29,32, the modeling principles are defined as follows: (1) Suspension bushing stiffness is not considered. (2) Wheel damping is not considered, and the wheel stiffness is defined as linear. (3) Nonlinearity due to the installation angle of key suspension components is not considered. Accordingly, the two-degree-of-freedom suspension model shown in Fig. 2 is taken as the basic model, as described by Eq. (1).
where M_b is the equivalent body mass (sprung mass), in kg; M_w is the equivalent wheel mass (unsprung mass), in kg; z_b, z_w, and z_r are the vertical displacements of the body, wheel, and road, in mm; and K_s is the suspension stiffness, in N/mm. When K_s is linear, its value is constant and does not change with the suspension travel. When K_s is nonlinear, it is expressed as a function of displacement and changes with the suspension travel. In the linear segment its value equals that of the linear K_s; the difference appears near the upper and lower travel limits, where the stiffness increases rapidly. Whether K_s is taken as linear or nonlinear depends on the analysis condition. On a flat road at constant speed, the suspension travel is generally within ±10 mm and remains in the linear section, so K_s need not be defined as nonlinear. However, when the vehicle drives under non-stationary large excitation, such as repaired asphalt roads, manhole covers, road joints, potholes, and speed bumps, the suspension travel is large, generally exceeding ±30 mm, and enters the nonlinear section of the suspension stiffness, so K_s must be defined as nonlinear.
C_a(I) is the damping coefficient of the adjustable damping suspension system, in N·s/m. It is a function of the current, and its value changes with the current to achieve timely damping adjustment. It can likewise be treated as "linear" controllable or "nonlinear" controllable, and the choice again depends on the analysis condition. On a flat road at constant speed, the shock absorber velocity is generally within ±0.10 m/s, below the valve opening velocity of the shock absorber (generally 0.10-0.20 m/s). The damping coefficient before the valve opening point can then be used as a linear damping coefficient, and C_a(I) need not be defined as nonlinear. However, when the vehicle runs under non-stationary large excitation such as repaired asphalt roads, manhole covers, road joints, potholes, and speed bumps, the shock absorber velocity is relatively large, generally exceeding ±0.20 m/s, so that the shock absorber works beyond its valve opening velocity. An averaged linear damping coefficient is then insufficient to characterize the damping force of the shock absorber. Especially in the medium- and low-velocity segment (0 to ±0.60 m/s), C_a(I) must be defined as nonlinear.
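The two-degree-of-freedom model described above can be sketched as follows. This is a minimal illustration, assuming the usual quarter-car force balance with a linear tire stiffness K_t and no tire damping; the function name, the callable interface for stiffness and damping, and the use of SI units (m, N/m) instead of mm and N/mm are assumptions made for the example, not the authors' implementation.

```python
def quarter_car_accel(state, z_r, m_b, m_w, k_s, k_t, c_a):
    """Return (body_acc, wheel_acc) for a two-degree-of-freedom suspension.

    state = (z_b, z_w, v_b, v_w) in SI units (m and m/s).
    k_s(defl) and c_a(rel_v) are callables, so a nonlinear stiffness or a
    current-dependent damping coefficient can be plugged in.
    """
    z_b, z_w, v_b, v_w = state
    defl = z_b - z_w                # suspension dynamic travel
    rel_v = v_b - v_w               # shock absorber velocity
    f_spring = k_s(defl) * defl     # elastic force
    f_damper = c_a(rel_v) * rel_v   # damping force
    f_tire = k_t * (z_w - z_r)      # linear tire force, tire damping ignored
    body_acc = -(f_spring + f_damper) / m_b
    wheel_acc = (f_spring + f_damper - f_tire) / m_w
    return body_acc, wheel_acc


# Hypothetical parameter values, chosen only to exercise the function.
linear_stiffness = lambda defl: 20000.0   # N/m
constant_damping = lambda rel_v: 1500.0   # N*s/m
print(quarter_car_accel((0.01, 0.0, 0.0, 0.0), 0.0,
                        400.0, 40.0, linear_stiffness, 200000.0, constant_damping))
```

Passing stiffness and damping as callables keeps the same routine usable for both the linear and the nonlinear parameter choices discussed in the text.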
As shown in Fig. 3, when the adjustable damping C_a(I) takes a linear value, the damping coefficients of both the recovery section and the compression section span an interval, and they generally increase with the current. Although the damping value is linear in velocity, the introduction of the current dimension makes it nonlinear (current-adjustable nonlinearity). Further, when the damping coefficient C_a(I) takes a nonlinear value, the nonlinearities introduced by the velocity dimension and the current dimension are superimposed to form a "variable" or "interval" nonlinear relationship F(I, v) between the damping force and the current and velocity, which strengthens the nonlinear characteristics of the suspension. This differs from the nonlinear damping relationship F(v) between damping force and velocity that is defined for a conventional passive damping system.
According to the above analysis, the research on the adjustable damping suspension is carried out for non-stationary excitation. Both the suspension stiffness and the suspension damping are assigned nonlinear values, as shown in Fig. 4, and the relevant vehicle parameters are listed in Table 1.
The non-stationary excitation condition is constructed from a composite time-domain model of a B-class pavement and a speed bump, as shown in Fig. 5. Equation (2) is the time-domain model of the speed bump, and Eq. (3) is the time-domain model of the B-class pavement. Here h_bump is the height of the speed bump section, in m, generally 0.04 m; l_bump is the width of the speed bump, in m, generally 0.40 m; V_el is the vehicle velocity, in m/s; t is the time, in s; and z_r(t) is the single-wheel road excitation, in m. f_min = n_l V_el, where n_l is the lower cutoff spatial frequency of the pavement, 0.011 m^-1. w(t) is white noise, n_0 is the reference spatial frequency (0.1 m^-1), and G_q(n_0) is the road roughness coefficient, in m^3.
Based on the constructed non-stationary excitation, the damping force and elastic force simulated with the nonlinear nominal model are shown in Figs. 6 and 7. According to the excitation input in Fig. 5, in the B-class pavement section the shock absorber velocity is about ±0.1 m/s, the suspension dynamic travel displacement is about ±10 mm, and the force values lie in the near-linear section. When the absolute value of the shock absorber velocity exceeds 1 m/s, the absolute value of the suspension dynamic travel displacement exceeds 30 mm, and the force values enter the nonlinear section. This further shows that the nonlinear characteristics of the suspension system should be considered under non-stationary excitation conditions with large impacts.
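The composite excitation can be sketched in code. The exact forms of Eqs. (2) and (3) are not reproduced here, so the example assumes two commonly used models: a half-cosine bump profile and a first-order filtered-white-noise road. The function names, the bump start time `t0`, the forward-Euler discretization, and the value G_q(n_0) = 64e-6 m^3 for a B-class road are all assumptions of this sketch.

```python
import math
import random


def speed_bump(t, t0, v_el, h_bump=0.04, l_bump=0.40):
    """Bump height (m) at time t for a bump met at time t0 (assumed half-cosine shape)."""
    duration = l_bump / v_el
    if t < t0 or t > t0 + duration:
        return 0.0
    phase = 2.0 * math.pi * (t - t0) / duration
    return 0.5 * h_bump * (1.0 - math.cos(phase))


def b_class_road(n_steps, dt, v_el, g_q=64e-6, n_0=0.1, n_l=0.011, seed=0):
    """Filtered-white-noise road profile (m), integrated with forward Euler."""
    rng = random.Random(seed)
    f_min = n_l * v_el                                  # lower cutoff frequency, Hz
    gain = 2.0 * math.pi * n_0 * math.sqrt(g_q * v_el)
    z_r, profile = 0.0, []
    for _ in range(n_steps):
        profile.append(z_r)
        w = rng.gauss(0.0, 1.0) / math.sqrt(dt)         # discrete unit-intensity white noise
        z_r += dt * (-2.0 * math.pi * f_min * z_r + gain * w)
    return profile


def composite_excitation(n_steps, dt, v_el, t0):
    """Sum of the random road and the speed bump, sampled every dt seconds."""
    road = b_class_road(n_steps, dt, v_el)
    return [road[k] + speed_bump(k * dt, t0, v_el) for k in range(n_steps)]
```

A fixed random seed keeps the road profile reproducible between training runs, which is convenient when comparing controllers on the same excitation.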

Uncertainty and fault boundary fuzziness of the controllable suspension system
The uncertainty and fault of the controllable suspension system exist in every process of the control system, from input acquisition and control operation to output execution. Uncertainty is usually divided into the time-delay type and the perturbation type. The time-delay type includes the sampling time of the sensor unit, the control operation time, the control current response time of the solenoid valve, and the damping force response time of the shock absorber.
The perturbation type uncertainty includes the signal perturbation of the sensing unit and the value perturbations of the driving current and the shock absorber damping force. The fault is divided into the gain type, the bias type, and the stuck type.
As shown in Table 2, the gain fault is manifested as a scaling of the sensor unit signal, the solenoid valve driving current, or the shock absorber damping force. The bias (offset) fault is manifested as a deviation of these values from the normal dynamic balance position, and the stuck fault is manifested as an output of these values at a constant value. When uncertainty and fault are transmitted along each process of the control system to the output execution, both intensification and attenuation occur. Intensification is reflected in the cumulative superposition of time-delay type uncertainty along the transmission path from sensor acquisition and control operation to output execution, so that the time delay of the output execution increases. Attenuation is reflected in the output damping force of the adjustable shock absorber, which is affected by two perturbations: the perturbation of the control current value, and the perturbation of the damping force value caused by the design and manufacturing error of the shock absorber. When these two parts are superimposed, the influence of the control current perturbation is weakened. Therefore, the uncertainty and fault analysis focuses on the actuator unit of the adjustable damping suspension system, the solenoid-valve type adjustable shock absorber, and takes it as the research object.
The output damping force u_uf of the solenoid-valve type adjustable shock absorber is determined by the external input excitation r and by the internal uncertainty and fault, as shown in Fig. 8a. The relationship between uncertainty and fault is clarified as follows. First, the perturbation type uncertainty is very similar to the gain fault (Table 2) in its mathematical description. In general, δ represents the perturbation of the damping force value caused by manufacturing error or by a change in the ambient temperature of the shock absorber, δ ∈ [−0.2, 0.2]. σ represents the fault caused by factors such as oil deterioration or oil leakage of the shock absorber or displacement deviation of the solenoid valve spool, σ ∈ [0, 1]. The perturbation type uncertainty range overlaps with the gain type fault range, and both eventually act on the suspension system through the damping force, so they are difficult to distinguish. Second, the occurrence of the fault is affected by the time delay, the perturbation, and the external input excitation. It is assumed that the time delay and the perturbation are mapped to the fault under the action of the external input excitation, as shown in Fig. 8b. τ, δ, and r each correspond to one coordinate axis, and the space is divided into perturbation/minor fault, general fault, and serious fault regions according to the critical values of each axis (τ_thr, δ_thr, r_thr). For a given cumulative time history, the perturbation/minor fault overlap zone is the region in which all three values are within their critical values, a general fault occurs if and only if exactly one dimension exceeds its critical value, and a serious fault occurs when two or more dimensions exceed their critical values. The state transfers gradually among perturbation/minor fault, general fault, and serious fault over the time history. However, as with traditional methods, two problems remain: how to justify the assumed probabilities of uncertainty and fault, and how to obtain the real-time value of the shock absorber damping force. For such a "black box" problem, a new approach based on qualitative relationship analysis can avoid online diagnosis and identification of uncertainty and fault. The idea is to use a data-driven method to establish the correlation between the input signal and the control law. The uncertainties, such as the time delay τ, the perturbation δ, and the external excitation r, are carried by the input signal. That is, the fault is treated fuzzily as a process quantity, without diagnosis or identification.
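The region division of Fig. 8b can be illustrated with a short sketch. It counts how many of the three quantities exceed their critical values and maps the count to a zone. The function name and the numerical thresholds are hypothetical, and treating "two or more" exceeded dimensions as a serious fault is an interpretation of the text rather than a definition confirmed by the source.

```python
def classify_zone(tau, delta, r, tau_thr, delta_thr, r_thr):
    """Map (time delay, perturbation, excitation) to a qualitative zone.

    The zone depends only on how many dimensions exceed their critical value.
    """
    exceeded = sum([
        abs(tau) > tau_thr,
        abs(delta) > delta_thr,
        abs(r) > r_thr,
    ])
    if exceeded == 0:
        return "perturbation/minor fault"
    if exceeded == 1:
        return "general fault"
    return "serious fault"


# Hypothetical critical values, for illustration only.
print(classify_zone(tau=0.01, delta=0.1, r=0.02,
                    tau_thr=0.02, delta_thr=0.2, r_thr=0.03))
```

In the proposed strategy this classification is never evaluated online; the sketch only makes the qualitative zones concrete.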

Robust fault-tolerant control architecture and strategy
Based on the analysis of the uncertainty and fault characteristics of the controllable suspension system in the previous section, a new robust fault-tolerant architecture and strategy is proposed. Following the principle that "white box problems are handled by knowledge-driven control and black box problems by data-driven control", the robust fault-tolerant control law of the uncertain system is divided into a "knowledge control law" and a "data control law". The knowledge control law is designed with empirical rules to ensure the interpretability of the control law. The data control law is designed with the data-driven method: a direct correlation is established between the input signal, which contains the uncertainty, and the control output, so that state estimation of the system uncertainty and fault is avoided. This involves the delineation of the two types of control laws, their fusion mechanism, and the realization of the strategy.

Knowledge-data fusion driven control for robust fault-tolerant architecture
Based on the boundary fuzziness analysis of perturbation type uncertainty and gain type fault of the controllable suspension system, a data-driven method is introduced to avoid the state estimation of system uncertainty and fault, and a robust fault-tolerant "data control law" is constructed. A "knowledge control law" for system performance optimization is designed based on the knowledge-driven method, taking into account the nonlinear and non-stationary characteristics of the system. The fusion mechanism of the two control laws lies in the dependency between robust fault tolerance and performance optimization control, which can be derived from the damping force characteristics of the shock absorber in the adjustable damping suspension system. First, the nominal damping force u_d of the adjustable shock absorber is defined in Eq. (4), where F(·) is a function that takes the shock absorber velocity and the control current as inputs, in N; I is the control current, in A; and v is the shock absorber velocity, in m/s, determined by the external input excitation. Further, on the basis of Eq. (4), u_ud denotes the damping force of the shock absorber with uncertainty, including the time-delay type and the perturbation type. The time-delay type enters through the control current I, and the perturbation type is reflected in the final output damping force, see Eq. (5).
At the same time, the fault expressions in Table 2 are combined into the additive and multiplicative characteristic fault shown in Eq. (6). This expression classifies the fault under an ideal assumed state, whereas the actual fault is time-varying: it covers the three ideal classification states and transitions dynamically among them. It is therefore preferable to unify them, except for the stuck fault, as a time-varying gain type σ(t). The fault expression is then converted into the multiplicative characteristic fault shown in Eq. (7), which covers the gain type and the bias type and can also represent the dynamic change between them.
By combining Eqs. (5) and (7), the damping force expression of the shock absorber including uncertainty and fault is obtained as Eq. (8), where u_uf is the adjustable damping force of the shock absorber including uncertainty and fault, in N.
Based on Eq. (8), the influence of uncertainty and fault on the damping force of the adjustable shock absorber can be obtained, as shown in Fig. 9.
Based on Fig. 9a, the influence of time-delay type uncertainty is analyzed. Assume that the expected damping force of the control system is F_1. If the control current I(t) is output without the delay τ(t), the shock absorber velocity at time t is v(t), corresponding to point ① in the figure, and the damping force is exactly F_1. In the actual system, however, there is a delay τ(t), so at time t + τ(t) the shock absorber velocity is v(t + τ(t)), corresponding to point ② in the figure, and the damping force is F_2, which deviates from the expectation. To make the damping force at time t + τ(t) equal F_1, that is, point ③ in the figure, the delayed damping force-velocity curve F(I(t + τ(t)), v(t)) can be scaled to output F_1. The time-delay type uncertainty can therefore be compensated by means of a product.
Likewise, Fig. 9b,c shows that when perturbation type uncertainty or gain type fault occurs, it can be compensated by scaling the damping force by a certain proportion. Overall, the uncertainty and fault of the adjustable damping suspension system are uniformly described as "multiplicative characteristics", which can be compensated by proportional multiplication on top of the performance optimization control. Specifically, the performance optimization output of the system is knowledge driven and determines the value of the control current I. The influence of τ(t), δ(t), and σ(t) is compensated by the data-driven part, which determines the scaling ratio of the control current I.
On this basis, knowledge-driven control and data-driven control are fused through multiplication, and the knowledge-data fusion driven robust fault-tolerant control architecture for the controllable suspension system is constructed, as shown in Fig. 10. The architecture consists of a knowledge-driven part and a data-driven part. The knowledge-driven control law C_kd (knowledge control law) handles suspension system performance optimization. The data-driven control law C_dd (data control law) handles the black box problem of uncertainty and fault in suspension control. The final control instruction C relies on the "multiplicative feature" of uncertainty and fault in Eq. (8) and is the numerical product of C_kd and C_dd. The "state" is the state variable of the controllable suspension system, and the "control" is the robust fault-tolerant controller. The "state" includes the body vertical acceleration z̈_b, the suspension relative velocity ż_b − ż_w, the suspension dynamic travel displacement z_b − z_w, etc.
The knowledge-driven part consists of "Input" (input), "ISH" (modified sky-hook strategy), and C_kd (output). "Input" includes the suspension relative velocity and the body velocity from "state", where the body velocity is obtained from the sprung vertical acceleration of the body by a Kalman filter 29. The data-driven part consists of "Observation" (observation input), "Reward" (reward and punishment rule), "RL" (reinforcement learning), and C_dd (output). The "Observation" includes the relative velocity ż_b − ż_w, the suspension dynamic travel displacement z_b − z_w, and the robust fault-tolerant control law C. It thereby carries the time delay, perturbation, and external excitation uncertainty information analyzed in the section "Uncertainty and fault boundary fuzziness of the controllable suspension system". The "Reward" consists of two parts: one is based on the performance indexes of the controllable suspension system, and the other considers the feedback effect of the robust fault-tolerant control law C.
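The multiplicative fusion can be made concrete with a small sketch. It assumes a hypothetical damping map that is linear in the current, and a product form σ(1 + δ)F(I, v) for the force under perturbation and gain fault, with the time delay ignored; neither assumption is taken from the paper's equations. The ideal scaling ratio is computed in closed form here only to show why a product compensates a multiplicative degradation. In the proposed method this ratio is learned by reinforcement learning without knowing δ or σ.

```python
def nominal_force(current, velocity, gain=2000.0):
    """Hypothetical damping map F(I, v), linear in the current for illustration."""
    return gain * current * velocity


def actual_force(current, velocity, delta=0.0, sigma=1.0):
    """Damping force with perturbation delta and gain fault sigma (assumed product form)."""
    return sigma * (1.0 + delta) * nominal_force(current, velocity)


def fused_command(c_kd, c_dd):
    """Final control instruction: numerical product of the two control laws."""
    return c_kd * c_dd


# Knowledge control law asks for 0.8 A; the damper has lost force output.
delta, sigma = -0.1, 0.8
c_kd = 0.8
c_dd = 1.0 / (sigma * (1.0 + delta))   # ideal ratio, known here only by construction
c = fused_command(c_kd, c_dd)
print(c, actual_force(c, 0.5, delta, sigma), nominal_force(c_kd, 0.5))
```

With a damping map that is nonlinear in the current, the required ratio is no longer a simple reciprocal, which is one motivation for learning it from data.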

Robust fault-tolerant control strategy driven by knowledge and data
Based on the knowledge-data fusion driven robust fault-tolerant control architecture of the controllable suspension system, a knowledge control law that handles performance optimization and a data control law that handles uncertainty and fault are designed. Together they realize robust fault-tolerant control of the controllable suspension system and improve the comprehensive ride comfort of the vehicle under the influence of uncertainty and fault.
The main performance indexes of the suspension system are the body vertical acceleration, the suspension dynamic travel displacement, and the wheel dynamic load. The body vertical acceleration is an important basis for evaluating ride comfort, and reducing its amplitude improves vehicle comfort. The suspension dynamic travel displacement is related to the travel limit: an excessive dynamic travel displacement causes impacts on the limit block, so reducing the dynamic travel displacement and the frequency of such impacts improves vehicle comfort. The dynamic load between the wheel and the road directly affects the adhesion between the wheel and the ground and is therefore related to vehicle handling stability. Reducing the dynamic load of the tire within a certain range improves handling stability.
For all three performance indexes, smaller is better, as shown in Eqs. ( 9), (1), where z̈_b is the vertical acceleration of the body, z_b − z_w is the dynamic travel displacement of the suspension, and z̈_w is the acceleration of the wheel. Because the term z_w − z_r in the wheel dynamic load K_t(z_w − z_r) is not easy to obtain in practice, z̈_w is used to represent the influence of the wheel dynamic load change. From the dynamic characteristics of the suspension system, it can be seen that the three performance indexes influence and restrict each other.
Further, the "fusion reward function" R_e driven by knowledge-data fusion consists of two parts. One is based on the performance indexes of the controllable suspension system and considers the weight relationship between them. The other considers the feedback effect of the robust fault-tolerant control law C.
where R_p is the comprehensive reward of the suspension system performance indexes, Eqs. (11)-(15), and R_c is the boundary constraint reward of the robust fault-tolerant control law C, Eq. (16). Here ω_j is the weight coefficient of each performance index of the suspension system, determined by the order relation method, as shown in Eq. (12). The importance ratio r_k is assigned according to expert experience: r_2 = 1.4 indicates that R_1 is significantly more important than R_2, and r_3 = 1.2 indicates that R_2 is slightly more important than R_3. The weight coefficients ω_1, ω_2, and ω_3 are then 0.4330, 0.3093, and 0.2577, respectively, which sum to one.
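The weight calculation can be checked numerically. The sketch below assumes the usual recursion of the order relation method: the last weight is the reciprocal of one plus the sum of the cumulative products of the ratios, and each earlier weight is the ratio times the following weight. The function name is hypothetical. For r_2 = 1.4 and r_3 = 1.2 this recursion gives approximately 0.4330, 0.3093, and 0.2577.

```python
def order_relation_weights(ratios):
    """Weights from importance ratios r_2..r_n, where r_k = w_(k-1) / w_k.

    ratios[0] is r_2, ratios[1] is r_3, and so on.
    """
    n = len(ratios) + 1
    # Accumulate r_n, r_(n-1)*r_n, ..., r_2*...*r_n.
    total, prod = 0.0, 1.0
    for r in reversed(ratios):
        prod *= r
        total += prod
    weights = [0.0] * n
    weights[-1] = 1.0 / (1.0 + total)
    # Back-substitute: each weight is its ratio times the next weight.
    for k in range(n - 2, -1, -1):
        weights[k] = ratios[k] * weights[k + 1]
    return weights


w = order_relation_weights([1.4, 1.2])
print([round(x, 4) for x in w], round(sum(w), 6))
```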
Here R_1 is the negative square of the body vertical acceleration: the greater its value, the greater the reward, that is, the lower the penalty. R_2 takes different values depending on a condition. When the suspension dynamic travel displacement is within the expected range, the reward is 0. Otherwise it is the negative square of the suspension dynamic travel displacement, so again a greater value means a greater reward and a lower penalty. R_3 is also conditional. When the dynamic load of the wheel does not exceed the static load of the wheel, the reward is 0. Otherwise the reward is −50, that is, a fixed penalty. R_c is triggered by the constraint boundary of the control law: the robust fault-tolerant control law must not leave the feasible region of the control current of the adjustable shock absorber, which is [0.3, 1.6]. When the control law is within the feasible region, the reward is 0. Otherwise the reward is −50, a large fixed penalty that discourages leaving the feasible region.
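The reward rules above can be written as a single function. This is a sketch under stated assumptions: the two parts R_p and R_c are assumed to be added, the expected travel range `travel_limit` is a hypothetical value, the wheel dynamic load is passed in directly instead of being derived from the wheel acceleration, and the default weights are those obtained from the order relation recursion with r_2 = 1.4 and r_3 = 1.2.

```python
def fusion_reward(body_acc, travel, dyn_load, static_load, command,
                  travel_limit=0.03,
                  weights=(0.4330, 0.3093, 0.2577),
                  current_range=(0.3, 1.6),
                  penalty=-50.0):
    """Fusion reward R_e = R_p + R_c (the sum is an assumption of this sketch)."""
    # R_1: negative square of the body vertical acceleration.
    r1 = -body_acc ** 2
    # R_2: zero inside the expected travel range, negative square outside.
    r2 = 0.0 if abs(travel) <= travel_limit else -travel ** 2
    # R_3: fixed penalty when the dynamic wheel load exceeds the static load.
    r3 = 0.0 if abs(dyn_load) <= static_load else penalty
    r_p = weights[0] * r1 + weights[1] * r2 + weights[2] * r3
    # R_c: fixed penalty when the command leaves the feasible current region.
    low, high = current_range
    r_c = 0.0 if low <= command <= high else penalty
    return r_p + r_c


print(fusion_reward(body_acc=2.0, travel=0.01, dyn_load=500.0,
                    static_load=4000.0, command=1.0))
```

Because the travel term is a squared displacement while the penalties are fixed at −50, the relative scale of the terms depends strongly on the units chosen for the travel.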

1) Knowledge control law.
The knowledge control law is modified based on the classical SH (Sky-Hook) control strategy 15. The on-off SH control law C_a(I) selects between C_max and C_min as follows. In Fig. 11, the body vertical velocity ż_b is the vertical coordinate and the shock absorber velocity ż_b − ż_w is the horizontal coordinate, which divides the plane into four quadrants. In the first quadrant, ż_b and ż_b − ż_w are both positive, which covers two motion cases. In the first case, the body and the wheel move upward together, and the body moves faster than the wheel. In the second case, the body moves up while the wheel moves down. Here the control requirement is for C_a(I) to be as large as possible, so C_max is taken. In the second quadrant, ż_b is positive and ż_b − ż_w is negative: the body and the wheel move upward together, but the body moves more slowly than the wheel. Here the control requirement is for C_a(I) to be as small as possible, so C_min is taken. The third and fourth quadrants are treated analogously.
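The quadrant rule can be condensed into the sign of a single product. The sketch below uses the standard on-off sky-hook form, in which the high damping coefficient is selected when the body velocity and the shock absorber velocity have the same sign. The function name and the handling of a zero product are assumptions.

```python
def skyhook_on_off(v_body, v_rel, c_max, c_min):
    """On-off sky-hook damping selection.

    v_body is the body vertical velocity and v_rel = v_body - v_wheel is the
    shock absorber velocity. First and third quadrants give c_max; second and
    fourth quadrants give c_min.
    """
    return c_max if v_body * v_rel >= 0.0 else c_min


# One sample point per quadrant, with hypothetical coefficient limits.
for v_body, v_rel in [(1.0, 0.5), (1.0, -0.5), (-1.0, -0.5), (-1.0, 0.5)]:
    print(v_body, v_rel, skyhook_on_off(v_body, v_rel, c_max=3000.0, c_min=500.0))
```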
The analysis of the SH control strategy shows that it has a certain conservative, passive robust fault tolerance. To ensure adaptability to working conditions, optimal performance is often sacrificed, and the control law simply follows the current boundary capabilities C_max and C_min of the adjustable shock absorber. When the adjustable shock absorber has perturbation type uncertainty or gain type fault, its damping force attenuates. At that point the damping force control output has already reached its adjustable upper limit, and there is no room to increase the control output to compensate for the attenuation. It can therefore be understood that only a certain passive robust fault tolerance is achieved under the current uncertainty and fault condition.
Further, combined with Fig. 4b and Eq. (8), C_max and C_min are converted into the currents I_max and I_min, since different currents correspond to different damping coefficient values. A map is constructed with the control current and the shock absorber velocity as inputs and the damping force of the shock absorber as output. If I_max and I_min are both constant, the damping control process always switches back and forth between the two boundary current states, which is similar to traditional SH control and gives limited robust fault tolerance. Therefore, with I_min fixed at the lower limit of the adjustable current, an interpolation mapping between I_max and ż_b is constructed, as shown in Fig. 12. I_max is thereby correlated with the vibration intensity of the body acceleration, which is related to the change of excitation energy, and I_max increases monotonically with the vibration intensity. The ISH control law is updated accordingly, Eq. (18). In this way, the full adjustable damping force range of the shock absorber can be used, multi-state continuous adjustment of the damping force is realized, system performance is improved when there is no fault, and adjustable space is reserved for robust fault-tolerant control.
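The modified law can be sketched as an on-off switch whose upper current follows an interpolation table. The breakpoints below are hypothetical placeholders for the mapping in Fig. 12, which is not reproduced here; only the current limits 0.3 and 1.6 come from the feasible region stated in the text. The function names and the clamping at the table ends are assumptions.

```python
def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(1, len(xs)):
        if x <= xs[k]:
            frac = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + frac * (ys[k] - ys[k - 1])
    return ys[-1]


def ish_current(v_body, v_rel, intensity, i_min=0.3,
                table_x=(0.0, 1.0, 3.0), table_y=(0.6, 1.0, 1.6)):
    """Sky-hook switch with an upper current that grows with vibration intensity.

    table_x and table_y are hypothetical breakpoints (intensity -> I_max).
    """
    i_max = interp_linear(abs(intensity), table_x, table_y)
    return i_max if v_body * v_rel >= 0.0 else i_min


print(ish_current(1.0, 0.5, intensity=0.5))   # interpolated upper current
print(ish_current(1.0, -0.5, intensity=0.5))  # lower current limit
```

Keeping I_max below the hardware limit at low intensity is what leaves headroom for the data control law to scale the command upward when the damper output degrades.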
2) Data control law. Reinforcement learning is a branch of machine learning: a self-learning decision-making method that imitates the learning behavior of animals (including humans). Research has shown that animals (including humans) learn by continual trial-and-error exploration, repeating rewarded behaviors and avoiding punished behaviors as far as possible. In the same spirit, reinforcement learning takes actions based on feedback from the environment. Through continuous interaction and trial and error with the environment, rewards or punishments are given based on quantitative feedback, ultimately achieving specific goals or maximizing the overall benefit of the actions. Reinforcement learning means learning "what can be done to maximize the numerical benefit signal" 30. Because reinforcement learning does not require labels for training data, can explore unknown fields, and has real-time decision-making ability when interacting with the simulation environment 31,32, it is chosen as the data-driven approach. The Proximal Policy Optimization (PPO) algorithm is selected. PPO is a model-free, online, policy gradient reinforcement learning method that alternates between sampling data through interaction with a simulation environment and optimizing a clipped surrogate objective function using stochastic gradient descent. The clipped surrogate objective improves training stability by limiting the size of the policy change at each step. The action space can be discrete or continuous; in this study it is continuous. PPO is based on the actor-critic framework, which builds on a value function: the actor learns the policy function π, and the critic learns the value function V. The algorithm steps are shown in Table 3.
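The way the PPO objective limits the policy change can be shown for a single sample. The sketch below implements the standard clipped surrogate term, using the probability ratio between the new and old policies and an advantage estimate. The clip factor of 0.2 is a commonly used default and an assumption here, since the source does not state the value used.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO objective term: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clip factor (assumed 0.2).
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with positive advantage is capped at (1 + eps) * A.
print(clipped_surrogate(1.5, 1.0))
# A small ratio with negative advantage is held at (1 - eps) * A.
print(clipped_surrogate(0.5, -1.0))
```

Once the ratio moves outside the clip interval in the direction that would increase the objective, the term becomes constant in the policy parameters, so the gradient offers no incentive to move further.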

Demonstrative example
Based on the knowledge-data fusion driven robust fault-tolerant control architecture and strategy proposed in Section "Robust fault-tolerant control architecture and strategy", and combined with Eqs. (1), (2), (3), and (8), an interactive simulation model is constructed that takes into account the nonlinearity, non-stationarity, and uncertainty of the controllable suspension system, as shown in Fig. 13. The "control" in the figure corresponds to Fig. 10. The initial parameters of the interactive simulation environment used for training are set as shown in Table 4.
The key parameters of the PPO algorithm are set as follows: the discount factor γ is 0.95, and the learning rate α is 0.001. The actor-critic network architecture is shown in Fig. 14. The input of the actor network is the Observation described in the previous section. The number of hidden layers is 3, and the number of nodes is 256. The output is defined as the mean and standard deviation of a continuous Gaussian distribution, ActorMean and ActorStd. The activation
Table 3. PPO algorithm steps.
1. Initialize the actor π(A|S, θ) with random parameter values θ.
2. Initialize the critic V(S, ϕ) with random parameter values ϕ.
3. Generate N experiences by following the current policy. The experience sequence is:
For each episode step t = t_s+1, t_s+2, ..., t_s+N, compute the return and advantage function.
Compute the advantage function D_t, which is the discounted sum of temporal difference errors:
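The advantage step in Table 3 can be sketched as a backward recursion over one batch of experiences. The example assumes the generalized advantage estimator form, in which temporal difference errors are summed with the factor γλ, and takes the return as the advantage plus the critic value. The discount factor 0.95 comes from the text; the smoothing factor `lam` and the function name are assumptions.

```python
def advantages_and_returns(rewards, values, bootstrap_value, gamma=0.95, lam=0.95):
    """Advantage D_t as a discounted sum of temporal difference errors.

    rewards[t] and values[t] belong to step t; bootstrap_value is the critic
    estimate for the state after the last step (0.0 if the episode ended).
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        next_value = values[t + 1] if t + 1 < n else bootstrap_value
        td_error = rewards[t] + gamma * next_value - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    # The return target for the critic is the advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


adv, ret = advantages_and_returns([1.0, 1.0], [0.0, 0.0], 0.0, gamma=0.5, lam=1.0)
print(adv, ret)
```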

Training result analysis
Reinforcement learning training is conducted based on the above settings, and the training iteration process is shown in Fig. 15. The total number of training episodes is set to 1000. PPOSH control (knowledge-data driven) converges after about 340 episodes, whereas PPO control (data driven) converges after about 600 episodes. The knowledge-data fusion driven control method therefore improves the learning convergence speed and shortens the training time.
After training, the PPOSH control is compared with the SH control (knowledge driven) and the PPO control (data driven). The defined non-stationary excitation condition is constructed from a composite time-domain model of the B-class road surface and a speed bump, so it includes both pulse excitation and random excitation. Because pulse and random excitation data are processed differently, the simulation data is divided into two sections according to the definition of road excitation. The 0-2 s section is defined as the pulse road excitation segment, which is dominated by the speed bump excitation. The 2-5 s section is mainly B-class road excitation and is defined as the random B-class road excitation segment. Figure 16 shows the time-domain results of the pulse road excitation segment, in which the excitation energy is large, the vehicle body acceleration response is also large, and the response shows a transient, abrupt character. In this case, emphasis is placed on attenuating vibration transmission, that is, on a targeted reduction of vehicle body acceleration. The results show that the vehicle body acceleration is clearly improved while the wheel acceleration and suspension dynamic travel displacement are not significantly different. Figure 17 shows the wavelet transform time-frequency analysis of vehicle body acceleration. Over the speed bump excitation time history, the vibration energy in the 0-20 Hz range is ordered SH control > PPO control > PPOSH control; the smaller the vibration energy, the better the vibration suppression.
Figure 18 shows the time-domain results of the random road excitation segment. The excitation energy of this segment is small, and the vehicle body acceleration response is also small. In this case, the emphasis is placed on strengthening the grounding control of the wheel, and the wheel acceleration is reduced effectively.
Figure 19 shows the Fourier transform frequency-domain analysis of vehicle body acceleration and wheel acceleration. During the random B-class road excitation period, the wheel vibration amplitude is significantly suppressed in the 0-20 Hz band that matters for comfort.
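To make the frequency-domain comparison concrete, the sketch below computes a single-sided amplitude spectrum from a sampled acceleration record and reports the largest component inside the 0-20 Hz band. It is only an illustration: the synthetic 10 Hz test signal and the function names are assumptions, the sampling period of 0.01 s follows the value of T_s stated in this paper, and a direct discrete Fourier transform is used instead of a fast transform to keep the example dependency-free.

```python
import cmath
import math


def amplitude_spectrum(signal, ts):
    """Single-sided amplitude spectrum via a direct DFT (adequate for short records)."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        # The DC and Nyquist bins are not doubled in a single-sided spectrum
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k / (n * ts))
        amps.append(scale * abs(acc) / n)
    return freqs, amps


def band_peak(freqs, amps, f_lo=0.0, f_hi=20.0):
    """Largest amplitude, and the frequency where it occurs, inside [f_lo, f_hi] Hz."""
    in_band = [(a, f) for f, a in zip(freqs, amps) if f_lo <= f <= f_hi]
    return max(in_band)


ts = 0.01                                    # sampling period in seconds
t = [i * ts for i in range(300)]             # a 3 s record
accel = [0.5 * math.sin(2 * math.pi * 10.0 * ti) for ti in t]  # synthetic signal
freqs, amps = amplitude_spectrum(accel, ts)
peak_amp, peak_freq = band_peak(freqs, amps)
```

Running the same two functions on the wheel acceleration records of two controllers would give band amplitudes that can be compared directly.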
The control laws of SH, PPO, and PPOSH are compared using the 1-2 s data of the pulse road excitation segment as an example, as shown in Figs. 20 and 21. As shown in Fig. 20a, the SH control law switches, according to its rules, between the defined minimum current of 0.3 A and the maximum current of 1.6 A. This is a conservative control mode. Regardless of whether a fault has attenuated the damping force of the adjustable shock absorber, the output follows the upper-limit capacity of the shock absorber, which gives a certain conservative robust fault tolerance. As shown in Fig. 20b, the PPO control law switches on demand within the range of adjustable drive current and has a certain adaptive adjustment ability, but it is not strongly related to the excitation condition. As shown in Fig. 21a, the road excitation during the 1-2 s period enters the pulse excitation at 1 s and returns to the B-class random road excitation after a short impact. The PPO control law shows no obvious difference in the upper and lower limits of its variation between the pulse excitation segment and the random excitation segment. In contrast, as shown in Fig. 20c, the PPOSH control law is adjusted within the range of adjustable current according to the intensity of the excitation energy, and it clearly follows the excitation in both the pulse excitation segment and the random excitation segment. This is attributed to the Cdd and Ckd components of the PPOSH control law, as shown in Figs. 21b,c. Ckd is a knowledge control law defined by Eq. (18), which has strong theoretical support. Cdd is a data control law learned with the PPO policy, which can obtain proportional compensation for uncertainty and fault in a fuzzy way.
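The two-level switching behaviour of the SH control law can be illustrated with the textbook on-off sky-hook rule. This is a sketch under stated assumptions: the switching condition below is the classical rule and may differ from the rule actually used in this paper, and it is assumed that a higher drive current gives higher damping. Only the current limits of 0.3 A and 1.6 A come from the text.

```python
I_MIN = 0.3  # A, minimum drive current stated in the text
I_MAX = 1.6  # A, maximum drive current stated in the text


def skyhook_current(body_velocity, wheel_velocity):
    """Classical on-off sky-hook rule (assumed form, for illustration only).

    Request maximum damping when the damper force would oppose the body motion,
    that is, when body velocity and suspension relative velocity share a sign.
    """
    relative_velocity = body_velocity - wheel_velocity
    if body_velocity * relative_velocity > 0.0:
        return I_MAX
    return I_MIN


# Body moving up while the suspension extends: high damping is requested
extending = skyhook_current(0.2, -0.1)
# Body moving up while the suspension compresses: low damping is requested
compressing = skyhook_current(0.2, 0.5)
```

The output only ever takes the two limit values, which matches the conservative, excitation-independent character described for Fig. 20a.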
Concretely, Fig. 20c is obtained by multiplying the values of (b) and (c) in Fig. 21. The knowledge control law Ckd is based on the dynamic characteristics of the suspension system and has strong interpretability. Using the fuzzy mapping capability of the data-driven method, the data control law Cdd establishes a direct correlation between the input signal containing uncertainty and the control output. It thereby provides proportional compensation for the impact of uncertainty and fault, avoids the process of diagnosing and identifying them, realizes estimator-free control, and simplifies the control loop.
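The fusion by numerical multiplication can be sketched in a few lines. The sketch assumes that Ckd is expressed as a drive current in amperes and that Cdd acts as a dimensionless compensation factor; saturating the product to the 0.3-1.6 A range is also an assumption, since the text does not say how out-of-range products are handled. The function name is hypothetical.

```python
I_MIN, I_MAX = 0.3, 1.6  # A, adjustable drive current range stated in the text


def fused_control_law(c_kd, c_dd, i_min=I_MIN, i_max=I_MAX):
    """Multiply the knowledge and data control laws, then saturate the result.

    c_kd : knowledge control law output (assumed to be a current in A)
    c_dd : data control law output (assumed to be a dimensionless factor)
    """
    return min(max(c_kd * c_dd, i_min), i_max)


nominal = fused_control_law(1.0, 1.2)    # compensation raises the command
saturated = fused_control_law(1.6, 1.3)  # product exceeds the upper limit
floored = fused_control_law(0.3, 0.5)    # product falls below the lower limit
```

Because the data control law only scales the knowledge control law, the fused command keeps the shape dictated by the suspension dynamics while its magnitude is compensated for uncertainty and fault.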
The adjustable damping force distributions of SH, PPO, and PPOSH control are compared using the 0-5 s data, as shown in Fig. 22. The utilization rate of the adjustable damping force is ordered PPOSH control > PPO control > SH control. The higher the utilization rate of the damping force, the better the control strategy adapts within the adjustable range of the damping force.

Robust fault-tolerant control effect of the suspension system
To evaluate the robust fault tolerance of the proposed knowledge-data-driven control strategy, the comprehensive control effect is evaluated under various combinations of uncertainty and fault. Uncertainty and fault are defined in Table 5 and combined, with $T_f$ = 5 s and $T_s$ = 0.01 s. The statistics of each quantitative evaluation index are shown in Tables 6 and 7. Table 6 shows the data statistics of the impulse road excitation input segment. When the road excitation energy is large and non-stationary, the time-domain peak values of vehicle body acceleration under the different combinations of uncertainty and fault all show that PPO control and PPOSH control are better than SH control, with an improvement of more than 12.97%, while the time-domain wheel acceleration and dynamic travel displacement change very little, within 4.22%. Compared with PPO control, PPOSH control shows obvious advantages in 3 of the 4 combinations and performs similarly only in the BD combination. Table 7 shows the data statistics of the random B-class road excitation input segment. When the road excitation energy is small and approximately stationary, PPO control and PPOSH control pay more attention to maintaining a clear road sense than SH control. The time-domain root mean square (RMS) values of wheel acceleration under PPO control and PPOSH control for the different combinations of uncertainty and fault are better than under SH control, with an improvement of more than 15.37%. Because of the mutually restrictive relationship between the vehicle body acceleration and wheel acceleration indexes, the time-domain RMS values of vehicle body acceleration and dynamic travel displacement increase, by no more than 9.43%. Compared with SH control, PPO control and PPOSH control can therefore balance body acceleration against wheel acceleration when the body acceleration is not large (about 0.3000 m/s^2). The PPOSH control optimizes wheel acceleration to the same degree as the PPO control in all four combinations.

Further, the performance indexes in Tables 6 and 7 are normalized to obtain the relative values of the three performance indexes (body acceleration, wheel acceleration, and suspension dynamic travel displacement) of SH control, PPO control, and PPOSH control under the combinations of uncertainty and fault; the lower the value, the better the control effect. The weight coefficients $\omega_i$ are determined by reference to Eq. (15), and the three performance indexes are weighted to obtain the normalized index P, Eq. (19). As shown in Figs. 23a and

where nor(·) is the normalization function and $P_i$ is each performance index, i = 1, 2, 3. The value of $P_i$ is the time-domain peak value in the 0-2 s pulse road excitation segment and the RMS value in the 2-5 s B-class road excitation segment.
where $P_{Gj}$ is the normalized index under the j-th combination of uncertainty and fault, j = 1, 2, 3, 4, and N = 4.
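The weighting and averaging described above can be illustrated with a short sketch. Several points are assumptions rather than details from this paper: nor(·) is taken to divide each index by its largest value among the three controllers, the weights and index values are invented for the example, and the averaging over the N = 4 combinations is taken to be a plain arithmetic mean.

```python
def normalized_index(indexes, weights):
    """Weighted sum of normalized performance indexes for each controller.

    indexes : {controller name: [P1, P2, P3]} for one uncertainty/fault combination
    weights : [w1, w2, w3] (hypothetical values, not the paper's)
    Normalization divides each index by its maximum over the controllers
    (an assumed form of nor()), so a lower result means a better control effect.
    """
    n = len(weights)
    maxima = [max(vals[i] for vals in indexes.values()) for i in range(n)]
    return {
        name: sum(w * v / m for w, v, m in zip(weights, vals, maxima))
        for name, vals in indexes.items()
    }


def mean_over_combinations(p_values):
    """Arithmetic mean of the normalized index over the N combinations."""
    return sum(p_values) / len(p_values)


# Invented numbers for a single combination
example = {
    "SH": [4.0, 0.05, 20.0],
    "PPO": [3.0, 0.05, 20.0],
    "PPOSH": [2.0, 0.05, 20.0],
}
scores = normalized_index(example, [0.5, 0.25, 0.25])
overall = mean_over_combinations([0.75, 0.80, 0.85, 0.80])
```

In this invented case the controller with the lowest body acceleration obtains the lowest score, which is the direction of improvement the text describes.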

Real-time experiment
From the perspective of what actual vehicle applications demand of a control method, effectiveness and feasibility are mainly reflected in two aspects: first, the control method brings a performance improvement; second, the control method can be deployed to run on a real-time system. In this section, an experiment environment is built from a real-time system and an actuator driver module to further verify the feasibility of the proposed control method. As shown in Fig. 25, the real-time testing environment consists of a Host (H), a Real-time system (RTS), an Actuator drive module (ADM), and a Shock absorber solenoid valve (SASV). H communicates with the RTS through Ethernet, the RTS is connected to the ADM through CAN, and the ADM and SASV are directly connected through connectors. H serves as the terminal for policy compilation, download, and data export. The compiled control strategy runs in the RTS, and the control load is applied through the ADM and SASV. The type, main parameters, and interface forms of each part are annotated in Fig. 25. Based on this real-time experiment environment, SH, PPO, and PPOSH control are tested and analyzed.
The relevant time-domain data is shown in Fig. 26. In the pulse road excitation section (near the impact peak, 1.0-1.1 s), the vehicle body acceleration is clearly improved while the wheel acceleration and suspension dynamic travel displacement are not significantly different. In the random road excitation section (taking 2.0-2.5 s as an example), the vehicle body acceleration response is low due to the small excitation energy, so the emphasis is placed on strengthening the grounding control of the wheel, and the wheel acceleration is reduced effectively. This is consistent with the simulation conclusions described in Sections "Training result analysis" and "Robust fault-tolerant control effect of the suspension system". The code generation steps, including reinforcement learning, using MATLAB are shown in Table 9. After code generation, the number of lines of code for each control method is 578 (SH), 42,824 (PPO), and 44,115 (PPOSH). Because it introduces reinforcement learning, the knowledge-data fusion driven method proposed in this article produces far more code than the classical control method. However, with the development of hardware technology, resources such as real-time systems can accommodate this code size, making the application of knowledge-data fusion driven methods possible. This demonstrates the feasibility of generating and deploying reinforcement learning algorithm code, and the real-time experiment results further validate the effectiveness of the proposed control method.

Figure 5. Time domain model of B-class pavement and speed bump.

Figure 6. The damping force distribution of nonlinear nominal model simulation results: (a) B-class pavement excitation segment (nearly linear), and (b) speed bump excitation segment (nonlinear).

Figure 7. The elastic force distribution of nonlinear nominal model simulation results: (a) B-class pavement excitation segment (nearly linear), and (b) speed bump excitation segment (nonlinear).

Figure 8. The relationship: (a) the output damping force with uncertainty and fault, and (b) uncertainty and fault sets.

Figure 9. Adjustable damper force-velocity curve with uncertainty and fault: (a) time delay type uncertainty, (b) perturbation type uncertainty, and (c) gain type fault.

Figure 12. Relationship between $I_{max}$ and vibration intensity.

5. Learn from mini-batches of experiences over K epochs:
a. Sample a random mini-batch data set of size M from the current set of experiences.
b. Update the critic parameters by minimizing the loss $L_{critic}$ across all sampled mini-batch data:
$$L_{critic}(\phi) = \frac{1}{M}\sum_{i=1}^{M}\left(G_i - V(S_i, \phi)\right)^2$$
c. Normalize the advantage values $D_i$ based on recent unnormalized advantage values:
$$D_i \leftarrow \frac{D_i - \mathrm{mean}(D_1, D_2, \cdots, D_M)}{\mathrm{std}(D_1, D_2, \cdots, D_M)}$$
d. Update the actor parameters by minimizing the actor loss function $L_{actor}$ across all sampled mini-batch data.
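The critic loss and the advantage normalization in step 5 translate directly into code. The following is a minimal sketch with hypothetical function names; using the population standard deviation and returning zeros when all advantages are equal are assumptions, since the step does not specify either detail.

```python
import statistics


def critic_loss(returns, value_estimates):
    """Mean squared error between returns G_i and critic estimates V(S_i)."""
    m = len(returns)
    return sum((g - v) ** 2 for g, v in zip(returns, value_estimates)) / m


def normalize_advantages(advantages):
    """Shift and scale the mini-batch advantages to zero mean and unit spread."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)  # population form is an assumption
    if std == 0.0:
        # All advantages are equal, so there is nothing to scale
        return [0.0 for _ in advantages]
    return [(d - mean) / std for d in advantages]


loss = critic_loss([1.0, 2.0], [0.0, 0.0])
scaled = normalize_advantages([1.0, 2.0, 3.0])
```

Normalizing the advantages keeps the size of the actor update comparable from one mini-batch to the next, regardless of the raw reward scale.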

Figure 13. Nonlinear non-stationary uncertain interactive simulation model of the controllable suspension system.

Figure 17. Time-frequency domain data of body acceleration in pulse road excitation section: (a) SH control, (b) PPO control, and (c) PPOSH control.

Figure 18. Comparison of time domain data in B-class road excitation section: (a) body acceleration, (b) wheel acceleration, and (c) suspension dynamic displacement.

Figure 21. The road excitation and control law: (a) road excitation $z_r$, (b) knowledge control law Ckd, and (c) data control law Cdd.

Figure 26. Comparison of time domain data in pulse road excitation section and B-class road excitation section: (a) body acceleration, (b) wheel acceleration, (c) suspension dynamic displacement, and (d) control law.

Table 1. Vehicle parameter list.

Table 2. Uncertainty and fault table. *In the table, τ is the delay, δ is the perturbation, σ is the fault constant gain, and ρ is the fault constant bias.

Table 4. Initial parameters of the interactive simulation environment.

Table 5. Uncertainty and fault definition.

Table 6. Pulse road excitation input segment (0-2 s). *In the table, $P_1$ is the acceleration of the body, $P_2$ is the dynamic travel displacement of the suspension, and $P_3$ is the acceleration of the wheel. The parameters in Table 7 are defined in the same way.

Table 9. The code generation steps including reinforcement learning.
1. Prepare the trained agent.mat network by reinforcement learning (RL).
2. Load agent.mat and generate evaluatePolicy.m with generatePolicyFunction().
3. Generate C++ code from the evaluatePolicy.m policy script file through codegen().
4. Integrate the C++ code into an S-function through the Legacy Code Tool.
5. Integrate the RL S-function and ISH to obtain PPOSH control, then compile and complete code generation.